DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 7:43 p.m.,
August 13, 2003
(#86) -
Chris R
Erik -
Is this an accurate synopsis of how you ran the simulations in Case 2 of post #81:
a) For each pitcher, randomly select a park factor. This park factor is normally distributed about 0 with a standard deviation of .004.
b) For each pitcher, then randomly select a defense factor. This factor is normally distributed about 0 with a standard deviation of .008.
c) Randomly select a (H-HR)/BIP talent value for each pitcher. These values are normally distributed about the sample mean, and have a standard deviation equal to the stdev value being evaluated.
d) Sum the park factor, defence factor, and talent value for each pitcher, then conduct 250-750 Bernoulli trials with E(X) = park + defense + pitcher for each pitcher.
e) Compare the observed standard deviation of (H-HR)/BIP in the simulation to the observed historical value. If the stdev numbers match up, the talent deviation number is assumed to be close to the actual MLB value.
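For concreteness, steps (a) through (e) can be sketched in a few lines of Python. This is only my reading of the procedure, not Erik's actual code. The league mean of .285 is a placeholder for the sample mean, and drawing each pitcher's BIP count uniformly from 250-750 is an assumption; the function and constant names are made up for illustration.

```python
import random
import statistics

PARK_SD = 0.004      # step (a), value from the post
DEFENSE_SD = 0.008   # step (b), value from the post
LEAGUE_MEAN = 0.285  # ASSUMPTION: placeholder for the sample mean (H-HR)/BIP


def simulate_observed_sd(talent_sd, n_pitchers=2000, seed=0):
    """Return the SD of simulated (H-HR)/BIP rates for one candidate talent SD."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_pitchers):
        park = rng.gauss(0.0, PARK_SD)              # step (a)
        defense = rng.gauss(0.0, DEFENSE_SD)        # step (b)
        talent = rng.gauss(LEAGUE_MEAN, talent_sd)  # step (c)
        # Guard: keep the summed probability inside [0, 1].
        p = min(max(park + defense + talent, 0.0), 1.0)
        bip = rng.randint(250, 750)                 # ASSUMPTION: uniform BIP count
        hits = sum(rng.random() < p for _ in range(bip))  # step (d)
        rates.append(hits / bip)
    return statistics.stdev(rates)                  # step (e): compare to history
```

Step (e) would then amount to calling simulate_observed_sd over a range of talent_sd values and keeping the one whose result is closest to the historical stdev.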
If I have missed anything here, I'd love to hear about it. I have some ideas about this, but I'll keep my mouth shut until I can be reasonably certain I know what I'm talking about.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:01 p.m.,
August 13, 2003
(#87) -
Chris R
Oh, and I apologize for not mentioning in my first post that this is great work that has been done so far.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 8:37 p.m.,
August 13, 2003
(#88) -
Chris R
One thing that has been considered is that parks and defences affect different pitchers differently. I believe you can account for this without complicating the simulation. The defence and park factor standard deviations currently being used seem to be on a team level. The standard deviations for those values (currently .008 and .004) should instead be calculated at the pitcher level, and should therefore be higher.
The park factor deviation could be estimated using the standard deviation of pitcher park factors (Home BABIP - Road BABIP) in the sample set. Unfortunately, I don't have a play by play database built, so I can't determine this number.
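The calculation itself would be simple once the splits exist. A minimal sketch, where the (home, road) pairs are made-up placeholder numbers and not real data, and the function name is hypothetical:

```python
import statistics


def park_factor_sd(splits):
    """SD of per-pitcher (home BABIP - road BABIP) differences."""
    diffs = [home - road for home, road in splits]
    return statistics.stdev(diffs)


# Placeholder (home BABIP, road BABIP) pairs; NOT real data.
example_splits = [(0.280, 0.291), (0.301, 0.287), (0.275, 0.279), (0.295, 0.284)]
print(park_factor_sd(example_splits))
```

Note that the raw spread of these differences also contains sampling noise from each pitcher's limited number of BIP, so it would overstate the park effect unless that noise is accounted for.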
The pitcher-specific defence deviation is not easy to determine, but we should keep in mind that it will be higher than the overall defence deviation.
One other thing you might consider doing is removing the 45-odd Charlie Hough, Joe Niekro, and Phil Niekro seasons from the data set. It is fairly well accepted that knuckleballers differ from other pitchers on BIP averages, and they are easily identifiable. While they make up a tiny number of the pitchers in the sample, their numbers probably inflate the sample stdev by a significant amount.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 1:50 a.m.,
August 14, 2003
(#91) -
Chris R
I figure we should be able to calculate, rather than simulate what our sample variance _should_ be.
I had considered this, but I don't think it is a problem. I certainly think it would be interesting to calculate, rather than simulate, what the pitcher variance should be, but if the simulation is done correctly, the answers should be very similar. Eliminating the simulation variance would help, but the dependency between park and defence factors, and the lack of a good estimate of defence variance relative to pitchers, are larger issues.
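For what it's worth, the calculated version can be sketched with the law of total variance: Var(observed rate) = E[p(1-p)] * E[1/n] + Var(p), where p is the pitcher's true hit probability and n his BIP count. The sketch below assumes the park, defence, and talent factors are independent (which, as noted, they probably are not), that n is uniform on 250-750 and independent of p, and uses .285 as a placeholder league mean. The function name is hypothetical.

```python
import math


def expected_observed_sd(talent_sd, park_sd=0.004, defense_sd=0.008,
                         league_mean=0.285, bip_low=250, bip_high=750):
    """Analytic SD of observed (H-HR)/BIP under the stated assumptions."""
    # Variance of the true hit probability p (ASSUMES independent factors).
    var_p = talent_sd ** 2 + park_sd ** 2 + defense_sd ** 2
    # Mean of 1/n for n uniform on bip_low..bip_high (ASSUMPTION).
    counts = range(bip_low, bip_high + 1)
    mean_inv_n = sum(1.0 / n for n in counts) / len(counts)
    # E[p(1-p)] = mean*(1-mean) - Var(p); multiply by E[1/n] for binomial noise.
    binomial_var = (league_mean * (1 - league_mean) - var_p) * mean_inv_n
    return math.sqrt(binomial_var + var_p)
```

If the simulation is done correctly, this number and the simulated stdev for the same talent_sd should agree up to simulation noise.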
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:22 a.m.,
August 14, 2003
(#93) -
Chris R
I've mentioned defence variance relative to pitcher a couple of times now, and I am not sure I have explained clearly what I mean by this.
Each of the data points used to determine the defence stdev number of .008 used in the most recent simulation represents the performance of a single team's defence over 162 games. However, it is being used to estimate the performance of a defence behind a single pitcher over the course of a season. Even if every defence treated all pitchers equally, the defence-pitcher variance would be larger than the defence-team variance because of the smaller sample sizes involved. Since we can also reasonably assume that defences do not treat all pitchers equally, the defence-pitcher variance should be larger still.
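The sample-size part of this argument can be illustrated with the binomial standard deviation of an observed rate, sqrt(p(1-p)/n). The sketch below covers only the sampling-noise component, not the unequal-treatment effect. The .285 rate and the 4400 team BIP figure are rough assumptions for illustration; 500 is the midpoint of the 250-750 range used in the simulation.

```python
import math


def binomial_sd(p, n):
    """SD of an observed rate from n Bernoulli trials with success probability p."""
    return math.sqrt(p * (1 - p) / n)


P = 0.285          # ASSUMPTION: placeholder league (H-HR)/BIP
TEAM_BIP = 4400    # ASSUMPTION: rough BIP behind a team defence in a season
PITCHER_BIP = 500  # midpoint of the 250-750 range used in the simulation

# Noise in a defence's observed rate: full team sample vs. one pitcher's sample.
print(binomial_sd(P, TEAM_BIP), binomial_sd(P, PITCHER_BIP))
```

Because the noise shrinks with the square root of n, the per-pitcher figure comes out several times larger than the per-team one under these assumptions.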
I don't see a simple way to estimate defence-pitcher variance, but I have an idea for moving closer to it. Erik's most recent estimate of .007 for pitcher-talent stdev should represent an upper bound for the actual value of pitcher-talent stdev. With that, you could calculate an observed BABIP variance for teammates, then turn the simulation around and calculate a lower bound for intra-team defensive variance. That number could then be combined with the current estimate for inter-team defensive variance to produce a new lower bound for pitcher-team variance. Repeat the process a couple more times, and you might have a better simulation.
Of course, there might be a closed form solution for determining these values, but I won't hold my breath.
DIPS year-to-year correlations, 1972-1992 (August 5, 2003)
Posted 2:54 a.m.,
August 14, 2003
(#97) -
Chris R
For the sake of continuity, I'll start spelling defense like everyone else.
Defense-team variance (.008) is what you have called alpha-defense (it sure would be nice to have a Greek keyboard right now).
Defense-team = inter-team variance,
Defense-pitcher = f(Var(inter-team),Var(intra-team))
Unfortunately, I'm not sure what f is.
Var(intra-team) = sum over i (pitcher-i $H - team $H)^2 /(n-1)
Basically, a pitcher's results will differ from his teammates' results, and his defense will differ from other teams' defenses. These two degrees of separation from the average pitcher could be estimated by a single random variable.
BTW, I envy your access to MATLAB. I'm stuck here contemplating writing simulations in Java.
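The Var(intra-team) formula above is straightforward to compute for one team. A minimal sketch, assuming $H means each pitcher's (H-HR)/BIP rate and that the team rate is supplied separately; the function name is hypothetical:

```python
def intra_team_variance(pitcher_rates, team_rate):
    """Sum over pitchers of (pitcher $H - team $H)^2, divided by (n - 1)."""
    n = len(pitcher_rates)
    if n < 2:
        raise ValueError("need at least two pitchers")
    return sum((rate - team_rate) ** 2 for rate in pitcher_rates) / (n - 1)
```

This gives only the intra-team piece. How it should be combined with the inter-team variance (the unknown f) is still the open question.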